ABSTRACT

This work describes a full Bayesian analysis of the Nearby Universe as traced by galaxies of 
the 2M++ survey. The analysis is run in two sequential steps. The first step self-consistently 
derives the luminosity dependent galaxy biases, the power-spectrum of matter fluctuations 
and matter density fields within a Gaussian statistic approximation. The second step makes a 
detailed analysis of the three dimensional Large Scale Structures, assuming a fixed bias model 
and a fixed cosmology. This second step allows for the reconstruction of both the final density 
field and the initial conditions at z = 1000 assuming a fixed bias model. From these, we 
derive fields that self-consistently extrapolate the observed large scale structures. We give two 
examples of these extrapolations and their utility for the detection of structures: the visibility 
of the Sloan Great Wall, and the detection and characterization of the Local Void using DIVA, 
a Lagrangian based technique to classify structures. 

Key words: methods: data analysis - methods: statistical - galaxies: statistics - large-scale 
structure of Universe 


1 INTRODUCTION 

Over the last decades, the wealth of galaxy redshift catalogues has increased dramatically.
Millions of galaxies with precise positions on the sky and accurate redshifts are now available
and have to be handled and processed on a routine basis. For example, the Sloan Digital Sky
Survey (SDSS, e.g. York et al. 2000; Abazajian et al. 2009; Ahn et al. 2014) provides millions
of galaxy redshifts, and the Six Degree Field Galaxy Redshift Survey (6dFGRS, Jones et al. 2009),
covering the southern sky, contains nearly 70 000 galaxies with accurate redshift measurements.
While the amount of data has steadily increased, progress in the development of modern data
analysis techniques has only been made in recent years. These advances are particularly crucial
to interpret ever more complex data sets where time evolution of objects (e.g. star formation
rate), non-linear dynamics (e.g. galaxy cluster formation), foreground subtraction as well as
systematic selection effects become increasingly important.

Inferring 3d density fields in a formal and rigorous Bayesian framework has several advantages.
The first and foremost advantage is that all observational aspects are treated self-consistently,
yielding inferred 3d density fields that do not require any post-analysis correction. The second
advantage is that the model yields more information on the density field than what is readily
usable in catalogues. For example, the tidal field created by visible large scale structures may
trigger the collapse of matter in other, unobserved areas of the Universe. This raises the
interesting possibility of predicting where structures (such as walls, filaments, clusters and
voids) form.


The actual presence of such inferred structures can then be tested a posteriori via dedicated
observations. Specifically, this work focuses on developing a probabilistic structure predictor.
We will concentrate on voids in difficult unobserved regions like the Galactic plane. To
characterize these voids we will make use of the previously presented DIVA framework
(Lavaux & Wandelt 2010).

Particularly successful approaches to solving such ill-posed inverse problems rely on the
Bayesian formulation of parameter inference. We define a forward data model that indicates how a
continuous three-dimensional density field is transformed into a set of predicted observables,
which are then directly compared to data. In our case the observable is the number density of
galaxies in co-moving space. Conversely, given the positions of galaxies we may infer this density
field provided it is decomposed on an adequate finite basis. In this context, the data model
should include everything that may happen between the density field and the detection of a galaxy
by an observer, which includes for example photon detection and galaxy detection efficiency. The
full problem cannot be solved in its entirety, but for sufficiently well constructed samples only
basic selection criteria, such as flux limitation and overall redshift completeness, are
important.

Even in this optimistic context, the parameter inference problem is daunting: for typical
inferences we need to treat on the order of 10^ - 10^ highly degenerate parameters, comprised
typically of the density per volume element and power spectrum values. There exists a wealth of
literature on the derivation of power spectra and correlation functions from noisy and incomplete
data (see e.g. Landy & Szalay 1993; Tegmark et al. 2004; Percival 2005). However, these methods
never fully grasp the complexity of the posterior of a blind analysis of power spectra in data.
More recent developments, notably stimulated by the requirements of the Cosmic Microwave
Background community (see e.g. Eriksen et al. 2004; Jewell et al. 2004; Wandelt et al. 2004),
have pushed the limits of density and power-spectrum reconstruction for galaxy redshift
catalogues (Jasche et al. 2010; Jasche & Kitaura 2010; Jasche & Lavaux 2015).

All the aforementioned techniques still require a good knowledge of how tracers have been
selected. To have the largest, deepest and cleanest galaxy redshift compilation we propose to use
the 2M++ galaxy compilation (Lavaux & Hudson 2011). This survey offers a near full sky coverage
at a magnitude $K_{2M++} < 11.5$ and above 50% coverage for $K_{2M++} < 12.5$. Evolutionary
effects of galaxies were corrected on average and the selection is done for a consistent
population of galaxies. Finally, redshift completeness maps are provided for the two magnitude
selections.

The data application presented in this work builds upon our previously developed Bayesian data
analysis algorithms ARES (Algorithm for REconstruction and Sampling, Jasche & Wandelt 2013b) and
BORG (Bayesian Origin Reconstruction from Galaxies, Jasche & Wandelt 2013a). Both these
algorithms perform a Bayesian analysis of the 3d distribution of galaxies, albeit with different
assumptions on the noise and on the dynamics of the tracers. This work is structured as follows.
In Section 2 we give a description of the 2M++ galaxy compilation, which is the data that we are
aiming at modelling. Then in Section 3 we present the pipeline and give a reminder on the working
of the ARES and BORG models and algorithms. In Section 4 we present the setup and the convergence
tests of the Bayesian inference. In Section 5 we analyse the results in the context of
cosmography and structure classification. Finally, in Section 6 we conclude.


2 THE 2M++ SURVEY 

In this work we follow a similar procedure as described in Jasche et al. (2010) and more recently
in Jasche et al. (2015), by applying the BORG algorithm to the 2M++ galaxy compilation
(Lavaux & Hudson 2011). The 2M++ is a superset of the 2MASS Redshift Survey (2MRS, Huchra et
al. 2012), with a greater depth and a higher sampling than the IRAS Point Source Catalogue
Redshift Survey (PSCZ, Saunders et al. 2000). The photometry is based primarily on the
Two-Micron-All-Sky-Survey (2MASS) Extended Source Catalogue (2MASS-XSC, Skrutskie et al. 2006),
an all-sky survey in the $J$, $H$ and $K_s$ bands. Redshifts in the $K_s$ band of the 2MASS
Redshift Survey (2MRS) are supplemented by those from the Sloan Digital Sky Survey Data Release
Seven (SDSS-DR7, Abazajian et al. 2009) and the Six-Degree-Field Galaxy Redshift Survey Data
Release Three (6dFGRS, Jones et al. 2009). Data from SDSS were matched to those of 2MASS-XSC using
the NYU-VAGC catalogue (Blanton et al. 2005). As the 2M++ draws from multiple surveys, galaxy
magnitudes from all sources were first recomputed by measuring the apparent magnitude in the
$K_s$ band within a circular isophote at 20 mag arcsec$^{-2}$. Following a prescription described
in Lavaux & Hudson (2011), magnitudes were then corrected for Galactic extinction, cosmological
surface brightness dimming and stellar evolution. After corrections the sample was limited to
$K_{2M++} < 11.5$ in regions not covered by the 6dFGRS or the SDSS, and limited to
$K_{2M++} < 12.5$ elsewhere. Other relevant corrections which were made to this catalogue include
accounting for incompleteness due to fibre-collisions in 6dF and SDSS, as well as treatment of the
zone of avoidance (ZoA). Incompleteness due to fibre-collisions was treated by cloning redshifts
of nearby galaxies within each survey region as described in Lavaux & Hudson (2011).

The treatment of the ZoA in the 2M++ will be ignored for this work as the Bayesian machinery
naturally and self-consistently accounts for incomplete observations. The Galactic plane will
thus simply be obscured, the objects marked as cloned removed from the catalogue and the
completeness set to zero in that region. The ZoA is defined in the 2M++ as the region delimited
by $|b| < 5°$ for $l > 30°$ and $l < 330°$, and $|b| < 10°$ for $l < 30°$ or $l > 330°$.
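
The ZoA definition above translates directly into an angular mask. The following is a minimal sketch for illustration, not the code used in the analysis; the function name `in_zone_of_avoidance` and the convention that the Galactic longitude is wrapped into [0°, 360°) are assumptions made here.

```python
import numpy as np

def in_zone_of_avoidance(l_deg, b_deg):
    """Return True where (l, b) falls inside the 2M++ zone of avoidance.

    Assumes Galactic longitude l and latitude b in degrees; l is wrapped
    into [0, 360) before the cuts are applied.
    """
    l = np.mod(np.asarray(l_deg, dtype=float), 360.0)
    b = np.abs(np.asarray(b_deg, dtype=float))
    # Towards the Galactic centre (l < 30 or l > 330) the cut is |b| < 10 deg,
    # elsewhere it is |b| < 5 deg.
    near_centre = (l < 30.0) | (l > 330.0)
    return np.where(near_centre, b < 10.0, b < 5.0)
```

With such a mask, setting the completeness to zero in the obscured region amounts to `completeness[in_zone_of_avoidance(l, b)] = 0` for arrays of pixel coordinates `l` and `b`.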

The galaxy distribution on the sky and the corresponding selection at $K_{2M++} < 11.5$ and
$11.5 < K_{2M++} < 12.5$ are given in Figure 1. The top row shows the data used in our analysis.
The lower row shows the redshift incompleteness, i.e. the number of acquired redshifts versus the
number of targets, for the two apparent magnitude bins. We note that the Galactic plane clearly
stands out and that the incompleteness is evidently inhomogeneous and strongly structured.

In addition to the target magnitude incompleteness and the redshift angular incompleteness, one
may also worry about the dependence of the completeness on redshift. This is not a problem for
the brighter cut $K_{2M++} < 11.5$, which is essentially 100% complete. We do not expect much
effect in the fainter magnitude bin as the spectroscopic data come from SDSS and 6dFGRS, both of
which have a homogeneous sampling and fainter magnitude limits than the 2M++.

We account for radial selection functions using a standard luminosity function $\Phi(L)$ proposed
by Schechter (1976). Using this function we can deduce the expected number of galaxies in the
absolute magnitude range, observed within the apparent magnitude range of the sample at a given
redshift. The $\alpha$ and $M^*$ parameters are given for the $K_s$-band in the line labelled
"$|b| > 10$, $K < 11.5$" of table 2 of Lavaux & Hudson (2011), i.e. $\alpha = -0.94$,
$M^* = -23.28$. The target selection completeness of a voxel, indexed by $p$, is then


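    c^t_p = \frac{\int_{\mathcal{V}_p} d^3\mathbf{r} \int_{L_{min}(r)}^{L_{max}(r)} \Phi(L) dL}{V_p \int_{L^{abs}_{min}}^{L^{abs}_{max}} \Phi(L) dL} ,    (1)

where $\mathcal{V}_p$ is the co-moving coordinate set spanned by the voxel and
$V_p = \int_{\mathcal{V}_p} d^3\mathbf{r}$ its volume. Here $L_{min}(r)$ and $L_{max}(r)$ bound,
at distance $r$, the part of the absolute magnitude range that is observable within the apparent
magnitude range of the sample, while $L^{abs}_{min}$ and $L^{abs}_{max}$ bound the absolute
magnitude range itself. The full completeness of the catalogue is derived from the product of
$c^t_p$ and the map corresponding to the considered apparent magnitude cut given in the bottom row
of Figure 1 after its extrusion in three dimensions.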
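
To make the role of the Schechter function concrete, the sketch below evaluates the fraction of galaxies of a given absolute magnitude range that pass an apparent magnitude cut at a single distance, i.e. the integrand ratio of the target completeness before averaging over the voxel. It is an illustration under simplifying assumptions that are ours, not the paper's: a Euclidean distance modulus with distances in $h^{-1}$ Mpc, no K-correction or evolution correction, and hypothetical function names.

```python
import numpy as np

ALPHA = -0.94      # Schechter faint-end slope quoted in the text
M_STAR = -23.28    # Schechter characteristic magnitude quoted in the text

def schechter_mag(M, alpha=ALPHA, m_star=M_STAR):
    """Unnormalised Schechter function per unit absolute magnitude."""
    x = 10.0 ** (0.4 * (m_star - M))          # L / L*
    return x ** (alpha + 1.0) * np.exp(-x)

def _integrate(m_lo, m_hi, n=2001):
    """Trapezoidal integral of the Schechter function between two magnitudes."""
    if m_hi <= m_lo:
        return 0.0
    M = np.linspace(m_lo, m_hi, n)
    y = schechter_mag(M)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(M)))

def target_completeness(d_hmpc, m_app_min, m_app_max, M_min=-25.0, M_max=-21.0):
    """Fraction of galaxies with M_min < M < M_max selected at distance d.

    Assumption: Euclidean distance modulus mu = 5 log10(d) + 25 with d in
    h^-1 Mpc; K-corrections and evolution are ignored.
    """
    mu = 5.0 * np.log10(d_hmpc) + 25.0
    lo = max(M_min, m_app_min - mu)    # brightest absolute magnitude selected
    hi = min(M_max, m_app_max - mu)    # faintest absolute magnitude selected
    return _integrate(lo, hi) / _integrate(M_min, M_max)
```

For the bright sample one would call `target_completeness(d, -np.inf, 11.5)`; nearby the whole absolute magnitude range is visible and the ratio is one, while at large distances it drops to zero.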

Finally, we note that our analysis accounts for luminosity dependent galaxy biases by following
the approach described in Jasche et al. (2015). In order to do so the galaxy sample is subdivided
into three equidistant bins in absolute $K$-band magnitude in the range $-25 < K_{2M++} < -21$.
The galaxy sample is further split into two sub-sets depending on the apparent magnitude: if
$K_{2M++} < 11.5$ a galaxy belongs to sample one; otherwise, if $11.5 < K_{2M++} < 12.5$, it
belongs to sample two. The bias in each of these bins is kept constant to greatly reduce the time
complexity burden, at the cost of losing a full marginalization over these parameters. The
determination of these values is left to ARES. The mean density of tracers, and thus the Poisson
noise amplitude, in each of these bins is sampled.
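
The resulting six sub-samples can be labelled with a few lines of code. The sketch below is only an illustration: the function name is hypothetical, the identifiers follow the ordering of Table 1 (0 to 2 for the faint apparent magnitude sample, 3 to 5 for the bright one, from the brightest to the faintest absolute magnitude bin), and the treatment of galaxies lying exactly on a bin edge is an assumption.

```python
import numpy as np

# Absolute-magnitude bin edges: three equidistant bins between -25 and -21
ABS_EDGES = np.array([-25.00, -23.67, -22.33, -21.00])

def subsample_id(k_app, m_abs):
    """Return the sub-sample identifier (0..5), or -1 if not selected.

    Identifiers 0-2: 11.5 < K <= 12.5; identifiers 3-5: K <= 11.5.
    """
    k_app = np.asarray(k_app, dtype=float)
    m_abs = np.asarray(m_abs, dtype=float)
    lum_bin = np.digitize(m_abs, ABS_EDGES) - 1      # 0, 1, 2 inside the range
    in_range = (lum_bin >= 0) & (lum_bin <= 2)
    offset = np.where(k_app <= 11.5, 3, 0)           # bright sample comes second
    selected = in_range & (k_app <= 12.5)
    return np.where(selected, lum_bin + offset, -1)
```

Each identifier then indexes its own bias, mean density and selection function in the inference.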

As will be described in more detail below, splitting the galaxy sample permits us to treat each
of these sub-samples as an individual data set, with its respective selection effects, biases and
noise levels.

Figure 1. We show here the 2M++ galaxy compilation. All plots use a Galactic coordinate system.
Top row: the 69081 galaxies that we used in the ARES and BORG analysis. Each galaxy is colour
coded according to its apparent redshift. Bottom row: redshift incompleteness mask for the two
magnitude cuts $K_{2M++} < 11.5$ and $11.5 < K_{2M++} < 12.5$. Blue corresponds to zero
completeness, which is equivalent in our scheme to being masked out.


3 METHODOLOGY 

In this section we give a brief introduction to the two Bayesian inference frameworks used in
this work: ARES (Algorithm for REconstruction and Sampling) and BORG (Bayesian Origin
Reconstruction from Galaxies).


3.1 The ARES framework 

The ARES framework is a full Bayesian large scale structure inference method targeted at
precision recovery of cosmological power-spectra from three dimensional galaxy redshift surveys.
Specifically it performs joint inferences of three dimensional density fields, cosmological power
spectra as well as luminosity dependent galaxy biases and corresponding noise levels for
different galaxy populations in the survey (Jasche et al. 2010; Jasche & Wandelt 2013b).

The complete problem solved by ARES has many parameters. In the case of a single population, the
data model implemented in ARES corresponds to the following:

    N_i = \bar{N} R_i (1 + b D_i \delta_i) + \epsilon_i ,    (2)

with $N_i$ the number of galaxies in the voxel $i$, $\bar{N}$ the mean density of the galaxy
population, $R_i$ the overall linear response operator of the survey (i.e. the redshift and the
target completeness), $b$ the population bias, $D_i$ the density growth factor in the voxel $i$,
$\delta_i$ the linear density at a reference redshift in the voxel $i$ and $\epsilon_i$ a random
instrumental noise. The noise is assumed to be Poissonian but is approximated by a Gaussian
distribution, neglecting the influence of the density fluctuations themselves. Thus we have

    \langle \epsilon_i \epsilon_j \rangle = \bar{N} R_i \delta^K_{ij} ,    (3)

with $\delta^K_{ij}$ equal to one for $i = j$ and zero otherwise. Finally, we add an isotropic
Gaussian prior on $\delta_i$. All the details of the general model and the posterior formulation
are given in Jasche & Wandelt (2013b). The linear bias model should be generally adequate to
model the largest scale density fluctuations. In that regime, through Taylor expansion, all bias
models are equivalent. However this is not the case at the smallest scales considered here
(~ 2.3 $h^{-1}$ Mpc), though we think that we should not be strongly biased by this assumption.
Effectively, we expect the signal-to-noise ratio of the measurement of density modes to peak at
intermediate scales (~ 10 $h^{-1}$ Mpc) and to decrease sharply both at small (for Poisson
sampling reasons) and large (for selection reasons) scales. Thus the measured bias should
actually represent the one at this typical scale. We finally note that the final confirmation
that the bias model is not causing problems is the a posteriori confirmation that the recovered
power spectrum is in agreement on large scales.

To summarize, the posterior from which we want to draw samples is

    \log P(\delta_i, \bar{N}, b, P(k) | N_i) = C - \sum_{i, R_i > 0} \frac{[\bar{N} R_i (1 + b D_i \delta_i) - N_i]^2}{2 \bar{N} R_i} - \frac{M}{2} \log \bar{N} - \frac{1}{2} \sum_k \left( \frac{|\hat{\delta}_k|^2}{P_k} + \log P_k \right) ,    (4)

with $C$ a constant, $\hat{\delta}_k$ the Fourier modes of $\delta_i$, $M$ the number of free
voxels with non-vanishing selection $R_i$,
and $P_k$ the discrete power spectrum of the density field. Such a posterior probability is too
complex to analyse directly. In order to provide full Bayesian uncertainty quantification the
algorithm explores the joint posterior distribution of all these quantities via an efficient
implementation of high dimensional Markov Chain Monte Carlo methods in a block sampling scheme.
In particular the sampling consists in generating from a Wiener posterior random realizations of
three dimensional density fields $\{\delta_i\}$ constrained by data $\{N_i\}$. Following each
generation, we produce conditioned random realizations of the power-spectrum $\{P_k\}$, galaxy
biases $\{b_q\}$ and noise levels through several sampling steps. Iteration of these sampling
steps correctly yields random realizations from the joint posterior distribution. In this fashion
the ARES algorithm accounts for all joint and correlated uncertainties between all inferred
quantities and allows for accurate inferences from galaxy surveys with non-trivial survey
geometries. Classes of galaxies with different biases are treated as separate sub-samples,
allowing even for combined analyses of more than one galaxy survey.
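
As a concrete illustration of the single-population data model of equations (2) and (3), the sketch below draws mock counts and evaluates the corresponding Gaussian log-likelihood. This is a toy written for this text, not the ARES implementation: the function names are hypothetical, the growth factor is taken as a scalar, and the density prior and the block sampler are left out.

```python
import numpy as np

def mock_counts(delta, response, nbar, bias, growth=1.0, rng=None):
    """Draw N_i = nbar R_i (1 + b D delta_i) + eps_i with Var(eps_i) = nbar R_i."""
    rng = np.random.default_rng() if rng is None else rng
    mean = nbar * response * (1.0 + bias * growth * delta)
    noise = rng.normal(size=delta.shape) * np.sqrt(nbar * response)
    return mean + noise

def log_likelihood(counts, delta, response, nbar, bias, growth=1.0):
    """Gaussian log-likelihood of the counts, summed over voxels with R_i > 0."""
    seen = response > 0                      # masked voxels carry no information
    var = nbar * response[seen]              # noise variance nbar R_i
    mean = var * (1.0 + bias * growth * delta[seen])
    resid = counts[seen] - mean
    return float(-0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var)))
```

Evaluating `log_likelihood` on mock data for different trial values of `bias` shows how the data constrain the bias once the density field is fixed, which is the kind of conditional step iterated in a block sampling scheme.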

This methodology has also been demonstrated to correctly treat anti-correlations between bias
amplitudes and power spectrum, which are not taken into account in traditional approaches to
power spectrum estimation, a 20 percent effect across large ranges in Fourier space
(Jasche & Wandelt 2013b). In this work we use an upgraded version of ARES which employs the
messenger method discussed in Elsner & Wandelt (2013). This particular implementation of the
Wiener posterior sampling has been demonstrated to improve upon the statistical efficiency of
previous implementations (Jasche & Lavaux 2015). In this work we use the ARES algorithm to infer
and calibrate luminosity dependent galaxy biases for the 2M++ galaxy survey.

3.2 The BORG algorithm 

In addition to ARES, this work also capitalizes on the BORG (Bayesian Origin Reconstruction from
Galaxies, Jasche & Wandelt 2013a) algorithm to perform a chrono-cosmographical analysis of the
2M++ galaxy survey. The BORG algorithm is a fully probabilistic inference machinery aiming at the
analysis of linear and mildly non-linear matter density fields in galaxy observations. The
algorithm incorporates a physical model for gravitational structure formation, which translates
the traditional task of reconstructing the 3d density field into the task of inferring the
corresponding initial conditions at an earlier epoch from present cosmological observations. This
results in a highly non-trivial Bayesian inverse problem, requiring the exploration of the very
high-dimensional and non-linear space of possible solutions to the initial conditions problem
from incomplete observations. These parameter spaces typically consist of 10^ to 10^ parameters,
corresponding to the discretized volume elements of the observed domain.

As for ARES, the BORG algorithm assumes a specific data model to interpret the galaxy redshift
catalogue and infer the three dimensional density field. We do not describe here the full problem
solved by BORG as such details are already described in Jasche & Wandelt (2013a); Jasche et
al. (2015). We nonetheless recall here the basic assumptions. BORG assumes that the distribution
of galaxies, after binning in volumetric elements, is Poisson distributed according to some
expectation. This expectation, $\lambda_i$, of the galaxy distribution in the voxel $i$ is
modelled as

    \lambda_i = \bar{N} R_i A (1 + \delta^f_i[\delta^i])^\alpha ,    (5)

with $\bar{N}$ the mean galaxy density, $R_i$ the linear response operator including the effects
of redshift and target completeness at the voxel $i$, $A$ and $\alpha$ the bias model parameters
and $\delta^f_i$ the non-linear density field at the voxel $i$, which functionally depends on the
initial density field $\delta^i$. The power law bias model behaves like the linear bias model
when $\delta^f_i$ is small compared to one. In this work the relation between $\delta^i$ and
$\delta^f$ is given by the 2LPT. As indicated above, in addition to the data model, we put a
Gaussian prior on the initial conditions, with a cosmological power spectrum. This Gaussian prior
does not enforce Gaussianity of the initial conditions. The prior only enforces that, without
access to data, Gaussian statistics should be followed. Intrinsically non-Gaussian features in
the data would not be erased under this assumption.
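
The Poisson data model of equation (5) can be sketched as follows. This is an illustration only: the function names are ours, the final density field is taken as a given array rather than computed with 2LPT, and the term $-\log N_i!$, which does not depend on the parameters, is dropped from the log-likelihood.

```python
import numpy as np

def expected_counts(delta_f, response, nbar, amp, alpha):
    """Poisson intensity lambda_i = nbar R_i A (1 + delta_f_i)^alpha.

    Requires delta_f > -1 so that the power law is well defined.
    """
    return nbar * response * amp * (1.0 + delta_f) ** alpha

def poisson_log_likelihood(counts, lam):
    """Sum of N_i log(lambda_i) - lambda_i over voxels with lambda_i > 0.

    The parameter-independent term -log(N_i!) is omitted.
    """
    seen = lam > 0
    return float(np.sum(counts[seen] * np.log(lam[seen]) - lam[seen]))
```

For a small fluctuation, e.g. `delta_f = 0.01` with `alpha = 1.5`, the intensity is very close to `1 + alpha * delta_f`, which is the sense in which the power law bias reduces to a linear bias on large scales.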

Our algorithm explores the posterior distribution of the Fourier modes of $\delta^i$ and the
meta-parameter $\bar{N}$. As pointed out previously, the 2LPT describes the one, two and
three-point statistics correctly and represents higher-order statistics very well (see e.g.
Moutarde et al. 1991; Buchert et al. 1994; Bouchet et al. 1995; Scoccimarro 2000; Scoccimarro &
Sheth 2002). Consequently, the BORG algorithm naturally accounts for features of the cosmic web,
such as filaments, that are typically associated with higher-order statistics induced by
non-linear gravitational structure formation processes. Besides higher-order statistics of the
density field, this posterior distribution also accounts for survey geometries, selection effects
and noise, inherent to any cosmological observation. The BORG algorithm provides full Bayesian
uncertainty quantification by exploring this highly non-Gaussian and non-linear posterior
distribution via an efficient Hamiltonian Markov Chain Monte Carlo sampling algorithm (see Duane
et al. 1987; Jasche & Wandelt 2013a, for details). As it incorporates an approximate model of
large scale dynamics, it automatically and fully self-consistently infers the dynamical evolution
of the large scale structure from observations. In this fashion the algorithm provides dynamical
structure formation histories compatible with both data and model. In order to account for
luminosity dependent galaxy bias and to make use of automatic noise calibration, we will further
use the modifications introduced to the original BORG algorithm by Jasche et al. (2015).
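
For readers unfamiliar with Hamiltonian Monte Carlo, the sketch below shows the generic algorithm (momentum refresh, leapfrog integration, Metropolis acceptance) on a toy Gaussian target. It is a textbook version with a unit mass matrix written for illustration; it is not the BORG sampler, whose posterior, gradients and mass matrix are described in the references above, and all names and default values are assumptions.

```python
import numpy as np

def hmc_sample(log_prob_grad, q0, n_samples, step=0.2, n_leapfrog=10, rng=None):
    """Generic Hamiltonian Monte Carlo with a unit mass matrix.

    log_prob_grad(q) must return (log p(q), gradient of log p at q).
    """
    rng = np.random.default_rng() if rng is None else rng
    q = np.array(q0, dtype=float)
    logp, grad = log_prob_grad(q)
    chain = np.empty((n_samples, q.size))
    for n in range(n_samples):
        p = rng.normal(size=q.size)                  # refresh momenta
        h_old = -logp + 0.5 * p @ p                  # initial Hamiltonian
        q_new, grad_new = q.copy(), grad.copy()
        p_new = p + 0.5 * step * grad_new            # first half kick
        for i in range(n_leapfrog):
            q_new = q_new + step * p_new             # drift
            logp_new, grad_new = log_prob_grad(q_new)
            if i < n_leapfrog - 1:
                p_new = p_new + step * grad_new      # full kick
        p_new = p_new + 0.5 * step * grad_new        # final half kick
        h_new = -logp_new + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:    # Metropolis acceptance
            q, logp, grad = q_new, logp_new, grad_new
        chain[n] = q
    return chain

def standard_normal(q):
    """Toy target: log-density (up to a constant) and gradient of a unit Gaussian."""
    return -0.5 * q @ q, -q
```

Calling `hmc_sample(standard_normal, np.zeros(2), 2000)` returns a chain whose sample mean and variance approach zero and one. In an application like the one described here the toy target would be replaced by the log-posterior of the initial conditions and its gradient.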


4 THE BAYESIAN ANALYSIS 

The analysis of the 2M++ galaxy sample has been performed on a cubic Cartesian domain with a side
length of 600 $h^{-1}$ Mpc consisting of $256^3$ equidistant grid nodes, resulting in
~ $1.6 \times 10^7$ inference parameters for both the ARES and the BORG runs. Thus the inference
procedure provides data constrained realizations for the final density field (and the initial
density field in the case of BORG) at a grid resolution of about 2.3 $h^{-1}$ Mpc. To integrate
the effect of the growth of large scale structure and the cosmological Doppler effects, we assume
a fixed standard $\Lambda$CDM cosmology with the following set of cosmological parameters
($\Omega_m = 0.3175$, $\Omega_\Lambda = 0.6825$, $\Omega_b = 0.049$, $h = 0.6711$,
$\sigma_8 = 0.8344$, $n_s = 0.9624$) taken from Planck Collaboration (2014). Additionally, for the
BORG runs, cosmological power-spectra for initial density fields were calculated following the
prescription provided by Eisenstein & Hu (1998) and Eisenstein & Hu (1999). For the ARES runs the
cosmological power spectrum, the bias values and the mean densities have been left free. Also
note that to guarantee a sufficient resolution of the final density field, we oversample the
initial density field by a factor of eight, which requires evaluating the 2LPT model with $512^3$
particles. The algorithm correctly accounts for the displacement of matter in the course of
structure formation by inferring initial density fields at their Lagrangian coordinates, while
final density fields are recovered at the corresponding final Eulerian coordinates.

We note that redshift space distortions are not modelled in the BORG algorithm and thus are not
accounted for explicitly. In its present formulation the BORG algorithm interprets features
associated with redshift distortions as noise and will tend to infer isotropic density fields.
Isotropy of density fields is naturally imposed by assuming diagonal covariance matrices for
initial density fields. Adding the treatment of redshift distortions, both small scale and large
scale, is not trivial. The redshift distortions on large scales induce a change in the likelihood
where the initial conditions appear twice (in the density field and in the way it is evaluated).
An illustration of the expected importance of this effect is given and discussed in Section 5.2.
The distortions on small scales, dubbed "finger-of-god" (first observationally noted by Jackson
1972), are even more complicated to model, and cause a spreading of the mass of haloes over a
large volume. This effect not only depends on scale but also on the density regime under
consideration. As demonstrated by Leclercq et al. (2015), cosmic voids reconstructed by the BORG
algorithm do not show any sign of redshift space distortions. With regard to reconstructed haloes,
tests on $N$-body simulations showed a remaining residual of 15 percent redshift space
distortions at the high mass end.

In total we generated 6552 data constrained realizations for initial and final density fields.
Generally, the computational cost to generate a single Markov sample is equivalent to about two
hundred 2LPT model evaluations. We measured the typical time to produce a single sample to be
about 1500 seconds on an Intel Xeon E5-4640 using 16 cores.
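
The numbers quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates values given in the text; the conversion of the total sampling time into days assumes, as a simplification of ours, that all samples were produced sequentially at the quoted rate.

```python
import math

BOX_SIDE = 600.0             # h^-1 Mpc, side length of the cubic domain
N_GRID = 256                 # grid nodes per dimension
OVERSAMPLING = 8             # oversampling factor of the initial density field
N_SAMPLES = 6552             # BORG samples generated
SECONDS_PER_SAMPLE = 1500.0  # measured time per sample

voxel_size = BOX_SIDE / N_GRID                         # h^-1 Mpc
n_parameters = N_GRID ** 3                             # density amplitudes
n_particles = OVERSAMPLING * n_parameters              # 2LPT particles
particles_per_dim = round(n_particles ** (1.0 / 3.0))  # particles per dimension
k_nyquist = math.pi / voxel_size                       # h Mpc^-1
wall_clock_days = N_SAMPLES * SECONDS_PER_SAMPLE / 86400.0

print(f"voxel size         : {voxel_size:.2f} h^-1 Mpc")
print(f"inference params   : {n_parameters:.2e}")
print(f"2LPT particles     : {particles_per_dim}^3")
print(f"Nyquist wavenumber : {k_nyquist:.2f} h Mpc^-1")
print(f"sequential runtime : {wall_clock_days:.2f} days")
```

The voxel size of about 2.3 $h^{-1}$ Mpc and the $512^3$ particles follow directly from the box size, the grid and the oversampling factor.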


5 INFERENCE RESULTS 

This section describes inference results obtained using our Bayesian analysis on the 2M++ galaxy
compilation. As mentioned above, we cannot run a single code to do the entire analysis. Even
though it is mathematically possible, the time complexity would be too high to obtain results in
a timely fashion. So we rely on a split analysis, using an approximate statistical model (ARES) to
derive some of the meta-parameters that will be used in the advanced model (BORG). We first
present the relevant results of the analysis using the ARES code in Section 5.1. Then we describe
the 3d density field obtained by the BORG code in Section 5.2 along with its convergence
properties. In Section 5.3 we present the cosmography of the final density field as inferred by
BORG. Finally, in Section 5.4 we give a quantitative assessment of the presence of the Local Void
behind the Milky Way's Galactic bulge.


5.1 Initialization analysis with ARES 

As described above, in this work we will use the ARES code to calibrate unknown luminosity
dependent galaxy biases, followed by a detailed analysis with the BORG algorithm. To perform this
initial analysis with ARES we will follow a similar approach as described in Jasche & Wandelt
(2013b). Specifically we will treat galaxies selected at $K_{2M++} < 11.5$ (sample 1) and
$11.5 < K_{2M++} < 12.5$ (sample 2) as two independent data sets with their respective survey
geometry and selection function, as detailed in Section 2. In addition we subdivide each of these
galaxy samples into three bins of absolute magnitude in the range -25.00 to -21.00 to account for
the respective luminosity dependent galaxy biases and noise levels. When applied to the 2M++ data,
the ARES code generated 4306 joint posterior realizations for the cosmic power-spectrum, the
density field, noise levels and luminosity dependent galaxy biases.^1 To demonstrate that the ARES
inference yielded physically correct results, in Figure 2 we show the comparison between the
inferred ensemble mean cosmological power-spectrum and a fiducial one, calculated according to
the prescription described in Eisenstein & Hu (1998) and Eisenstein & Hu (1999). As can be seen,
ARES has recovered the shape of the cosmological power-spectrum within the corresponding one
sigma confidence regions. No particular sign of bias can be observed across the modes in Fourier
space. Erroneous treatment of survey geometries, selection effects and galaxy biases typically
yields artefacts of false power in the power-spectrum. The absence of such artefacts in Figure 2
therefore indicates that these effects have been accounted for accurately.

In Figure 3 we show the values of the bias parameter found in the different sub-samples, taking
the faintest magnitude bin of sample 2 as reference with a fiducial value of one. The results are
given as red and blue coloured boxes. The width of those boxes corresponds to the width of the
magnitude interval and their height to the 95% confidence interval. In addition, the best fit of
Westover (2007) has been plotted in black, alongside its error bar. The best fit of Westover
(2007) is given by

    \frac{b}{b_*} = (0.73 \pm 0.07) + (0.24 \pm 0.04) \frac{L}{L_*} ,    (6)

with $L$ the intrinsic luminosity of the considered galaxy population and $L_*$ the reference
luminosity, which for 2M++ is given by $M^* = -23.25$. We have adjusted the reference so that a
bias of one is given for our reference population (sample 2, faint luminosity bin). We note the
very good agreement between the two measurements. The advantages of our procedure are its full
automation and the simultaneous derivation of an unbiased power spectrum and of the matter density
field. Also, we have used a limited number of bins, but nothing prevents us from increasing their
number, at the cost of a lower signal-to-noise ratio per bin. The most important result of the
ARES analysis for this work is the derivation of the luminosity-dependent galaxy biases for the
galaxy population selected in 2M++. We use these biases as-is in the following BORG
reconstruction. While the two bias models are relatively different, in the regime of small density
fluctuations on large scales they can be related through a Taylor expansion:
$(1 + \delta_{nl})^\alpha \simeq 1 + \alpha \delta_{nl}$ and thus $b \simeq \alpha$. Of course
this equality is not exact and probably leads to some bias in the density field reconstruction.
We expect in the future to be able to jointly infer the bias parameter in BORG with the density
field itself at lesser computational cost, which will remove any foreseeable problem.
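
The comparison with the fit of Westover (2007) can be reproduced approximately with the sketch below. It evaluates the central values of equation (6) at the centres of the three absolute magnitude bins and renormalizes to the faintest bin. Evaluating the relation at the bin centres, rather than averaging it over the luminosity function within each bin, is a simplification made here, and the function name is hypothetical.

```python
import numpy as np

M_STAR = -23.25                                          # reference magnitude quoted in the text
BIN_EDGES = np.array([-25.00, -23.67, -22.33, -21.00])   # absolute-magnitude bin edges

def westover_bias(m_abs, m_star=M_STAR):
    """Central values of equation (6): b / b* = 0.73 + 0.24 L / L*."""
    l_ratio = 10.0 ** (-0.4 * (np.asarray(m_abs, dtype=float) - m_star))
    return 0.73 + 0.24 * l_ratio

# Assumption: each bin is represented by its central magnitude.
centres = 0.5 * (BIN_EDGES[:-1] + BIN_EDGES[1:])
relative_bias = westover_bias(centres) / westover_bias(centres[-1])
print(relative_bias)
```

Under these assumptions the renormalized values for the two brighter bins fall within a few per cent of the biases 1.74 and 1.21 inferred by ARES for sample 2 (Table 1).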

5.2 3d density field 

Using the inferred bias values, as described above, we have run the BORG algorithm on the 2M++
compilation data. The results are presented in Figures 4, 5, 6 and 7.

In Figure]^ we show the sequence of power-spectra of the ini¬ 
tial density field as the chain is attached to a locus around the max¬ 
imum posterior. The top panel shows the raw power spectra and the 
bottom panel are the same power-spectra divided by the assumed 
ACDM initial linear power-spectrum. We note that after a conver¬ 
gence in ~ 400 samples, the power spectra starts oscillating on large 
scales (k < 0.1 A Mpc"^). This indicates the chain has extracted all 

^ ARES run has been done on a standard workstation Intel Core i7-2600, 8 
cores, in a week. 
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Sample selection 


Identifier 


N( 

12.5 < X 2 M++ < 12.5 and 

-25.00 < 

-23.67 < 

-22.33 < 

< -23.67 

< -22.33 

< -21.00 

0 

1 

2 

1.74 

1.21 

1.00 

(1.04 ± 0.01) X 10-2 
(8.6 ± 0.1) X 10-2 
(1.37 ± 0.03) X 10-‘ 

K 2 M++ ^11.5 and 

-25.00 < 

-23.67 < 

-22.33 < 

< -23.67 

< -22.33 

< -21.00 

3 

4 

5 

1.70 

1.20 

1.15 

(1.12 ± 0.01) X 10-2 
(8.60 ± 0.07) X 10-2 
(1.24 ± 0.02) X 10-‘ 


Table 1. Bias parameters corresponding to the power-law bias model, as described in the text, for six galaxy sub-samples, subdivided according both to their 
absolute i^ 2 M++-band magnitudes and their apparent magnitudes. The sub-sample 2 is taken as fiducial with a bias set to one. 



Figure 2. Power-spectrum measured by ARES. We show here the local of 
the maximum posterior for each bin in k space (thick red line), alongside the 
95% probability volume (filled grey area) of the power spectrum as mea¬ 
sured by ARES with the 2M-I--I-. The reference power spectrum computed 
using the |Eisenstein & Hu|(1998|[l999^ approximation including wiggles 
contributions for the cosmology given in Section|^ 


Figure 3. Bias values from ARES: We show here the bias values inferred 
from the 2M-I-I- catalogue assuming a fiducial bias of one for the sub-sample 
2 of Table[T](11.5 < K 2 M++ < 12.5, red). In addition we have the overplot¬ 
ted the best fit of |Westo\^ j2007) readjusted for the magnitude bin that 
serves us as a reference (black). The width of the box gives the interval size 
of the magnitude bin. Their height gives the 95% confidence limit of the 
measurement on bias. 


the available information at these scales from observations. Additionally,
this indicates that the correlation length of the Markov chain is
of the order of ~400 sampling steps. On intermediate scales
(0.1 h Mpc^-1 < k < 2 h Mpc^-1) the power-spectrum is strongly constrained
and unbiased compared to our reference power-spectrum.
At very small scales the noise increases again because we reach
scales that are at most the size of a voxel element. Consequently all
information is lost. We note that, contrary to Kitaura (2013), we do not
observe any bumps in the power-spectra of reconstructed phases
at intermediate scales. Finally, we handle unobserved regions
sufficiently well that the power spectra appear unbiased.
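
The correlation length mentioned above can be estimated directly from a chain. The following is a minimal sketch and not the ARES implementation: it assumes a one-dimensional trace of a single power-spectrum bin (the array `trace` here is synthetic), and it defines the correlation length as the first lag at which the normalized autocorrelation falls below a threshold of 0.1, which is an arbitrary choice made for illustration.

```python
import numpy as np

def autocorrelation(trace):
    """Normalized autocorrelation of a 1D chain trace for lags 0..n-1."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = x.size
    # Full linear correlation; keep the non-negative lags only.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    return acf / acf[0]

def correlation_length(trace, threshold=0.1):
    """First lag at which the autocorrelation drops below `threshold`.

    Returns None if the autocorrelation never decays below the threshold.
    """
    acf = autocorrelation(trace)
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else None

# Synthetic AR(1) trace standing in for one power-spectrum bin (illustrative only).
rng = np.random.default_rng(0)
phi = 0.9  # per-step correlation; the theoretical autocorrelation is phi**lag
trace = np.zeros(20000)
for i in range(1, trace.size):
    trace[i] = phi * trace[i - 1] + rng.normal()

print(correlation_length(trace))
```

For this toy chain the theoretical autocorrelation phi**lag crosses 0.1 near a lag of 22; the estimate from a finite chain scatters around that value.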

In Figure 5 we show the mean initial density field (top row),
the 2LPT evolved mean final density field (middle row) and the
input data (bottom row) for the X, Y and Z planes of the Equatorial
coordinate system. The edge of the 2M++ survey is clearly visible
in the mean final density field. In these panels, we see clearly
defined structures in the central region, which is close to the observer
and more likely to be fully complete. Towards the boundaries of
the cubic domain, structures become increasingly blurry as we
leave the observed volume at a distance of ~200 h^-1 Mpc from
the centre. In the initial conditions (top row), these edges are far
less clear, which emphasizes that the information stored in the
current position of galaxies comes from extended regions in Lagrangian
coordinates and that information is distributed differently in initial
and final conditions (Jasche et al. 2015). Finally, we see the visual
improvement obtained from the final density field derived by BORG
compared to the actual distribution of galaxies given in the bottom
row.

In Figure 6 we show the impact, a posteriori, of the large scale
component of redshift space distortions. In particular, for this test
we assume that inferred density fields have been correctly recovered
in real space and add redshift space distortions corresponding
to velocities derived through 2LPT dynamics. The left-hand panel
of Figure 6 reproduces the real space density field of Figure 5 (centre
column) as determined by BORG. The middle panel shows the
redshift distortion effects produced by peculiar velocities predicted
by the 2LPT dynamics on the density field. The right-hand panel
gives the difference between the middle and the left-hand panel,
highlighting the regions that have moved due to redshift distortions.
On top of the three density fields, we have drawn a red dash-dotted
grid with a spacing of 50 h^-1 Mpc. As can be seen, Large Scale
Structures are not moved much by the large scale component of the
peculiar velocities. The most important effects lead to smearing of
filaments and haloes (middle panel), which already happens when
comparing 2LPT dynamics to the full non-linear solution since 2LPT
Figure 5. Three slices from different directions through the three dimensional ensemble posterior means for the initial (upper panels) and final density fields
(middle panels) estimated from 6552 samples. The lower panels depict corresponding slices through the galaxy number counts field of the SDSS main sample.
The directions are respectively along the equatorial planes x = 0 (left-hand column), y = 0 (middle column) and z = 0 (right-hand column).


does not capture shell crossing effects very well. Inspection of the
right-hand panel indicates that structures typically move by a few
Mpc, which is of the same order as the grid resolution used here
(~2.3 h^-1 Mpc). Thus, on scales larger than a single voxel,
inferred density fields are not affected much by this effect. We
conclude that, for the purpose of density reconstruction, the
fields predicted by BORG are very close to what they would be if
redshift space distortions were taken into account.
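
The a posteriori test above amounts to displacing matter along the line of sight by its peculiar velocity. The sketch below illustrates the standard real-to-redshift-space mapping s = r + (v · r̂) r̂ / H0 at z = 0, applied to point tracers. It is an illustration only: the function name and the toy positions and velocities are hypothetical, and the velocities in the text come from 2LPT, which this sketch does not implement. With distances in h^-1 Mpc and velocities in km/s, H0 = 100 km/s per h^-1 Mpc and the factor h cancels.

```python
import numpy as np

H0 = 100.0  # km/s per (h^-1 Mpc); h cancels when distances are in h^-1 Mpc

def to_redshift_space(pos, vel, observer=(0.0, 0.0, 0.0)):
    """Shift positions (h^-1 Mpc) along the line of sight using velocities (km/s).

    Tracers located exactly at the observer are not handled.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    r = pos - np.asarray(observer, dtype=float)
    dist = np.linalg.norm(r, axis=1, keepdims=True)
    r_hat = r / dist                                   # unit line-of-sight vectors
    v_los = np.sum(vel * r_hat, axis=1, keepdims=True)  # line-of-sight velocity
    return pos + (v_los / H0) * r_hat

# Toy tracers (hypothetical values, not taken from the reconstruction).
pos = np.array([[100.0, 0.0, 0.0], [0.0, 50.0, 0.0]])
vel = np.array([[300.0, 200.0, 0.0], [100.0, -230.0, 0.0]])
shift = to_redshift_space(pos, vel) - pos
print(shift)
```

In these units a line-of-sight velocity of 230 km/s corresponds to a shift of 2.3 h^-1 Mpc, and only the velocity component along the line of sight contributes.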


5.3 Cosmography 

In Figure 7 we show the supergalactic plane as seen from a thin
slice of the final density field (coloured background field) computed
by BORG and a 20 h^-1 Mpc thick slice extracted
directly from the galaxy data (magenta dots). We have represented
the data in polar coordinates so that the Supergalactic longitude can
be directly read from the plot.
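
The slab selection and polar representation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a hypothetical array of Cartesian supergalactic coordinates in h^-1 Mpc, the quoted thickness is interpreted as the total thickness of the slab (so the cut is at half that value on each side of the plane), and the distance returned is the distance projected onto the plane.

```python
import numpy as np

def slab_to_polar(xyz, thickness=20.0):
    """Select points within a slab around z = 0 and return (longitude, distance).

    xyz       : array of shape (N, 3), Cartesian coordinates in h^-1 Mpc
    thickness : total slab thickness in h^-1 Mpc (assumed symmetric about z = 0)
    """
    xyz = np.asarray(xyz, dtype=float)
    in_slab = np.abs(xyz[:, 2]) <= 0.5 * thickness
    x, y = xyz[in_slab, 0], xyz[in_slab, 1]
    lon = np.degrees(np.arctan2(y, x)) % 360.0  # longitude in [0, 360) degrees
    dist = np.hypot(x, y)                       # distance projected on the plane
    return lon, dist

# Toy points (hypothetical): the third one lies outside the slab.
pts = np.array([[0.0, 70.0, 5.0], [-50.0, 0.0, -8.0], [10.0, 10.0, 30.0]])
lon, dist = slab_to_polar(pts)
print(lon, dist)
```

The same selection with a different thickness and the Galactic rather than Supergalactic frame would apply to the Galactic plane slices discussed later.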


Major structures of the Local Universe are clearly visible both
in the galaxies and in the final density field. Also, the density field
in the Galactic plane (visible at L = 0° and L = 180°) is smoothly
extrapolated from neighbouring structures. We typically see the
Pisces-Cetus supercluster (L ~ 305°, d ~ 180 h^-1 Mpc; Tully
1986), the Coma cluster (L ~ 90°, d ~ 70 h^-1 Mpc; Wolf;
Hubble & Humason 1931), the Shapley concentration (L ~ 149°,
d ~ 140 h^-1 Mpc; Scaramella et al. 1989; Raychaudhury 1989)
and the Perseus-Pisces supercluster (L ~ 343°, d ~ 55 h^-1 Mpc;
Joeveer et al. 1978). We note a quite prominent circular filament
connected to the Shapley concentration, going from L ~ 100°
to L ~ 150° at d ~ 140 h^-1 Mpc, located just behind the Bootes
void. We are not aware of any name given to this filament; we name
it the Virgo-Bootes-Hercules filament.
Figure 6. Impact of the large scale component of redshift space distortions on the reconstructed density field. The two panels show the ensemble mean final
density fields of the central equatorial slice at y = 0 h^-1 Mpc. The left-hand panel gives the real space density field, while the middle panel gives the density
field obtained by applying redshift space distortions assuming 2LPT dynamics. The right-hand panel gives the density difference between the middle panel
and the left-hand panel in the same slice. The horizontal and vertical dashed red lines have been drawn to allow for an easier comparison between the two
density fields.



[Figure 4 axis: k (h Mpc^-1)]

Figure 4. We show here the burn-in phase of the power spectrum.
The top panel shows the power spectrum itself, coloured according to the
identifier of the step along the Markov Chain. The bottom panel shows the
same power spectra after division by the assumed ΛCDM initial linear
power-spectrum.


As a final remark, we note that the Sloan Great Wall is clearly
visible in the reconstructed density field shown in the middle
right-hand panel of Figure 5 at x ~ 225 h^-1 Mpc, y ~ 0 h^-1 Mpc. The
wall itself is not clearly visible in the galaxy distribution shown in
the panel just below. Judging by the amplitude of the mean field,
the Sloan Great Wall is not as well characterized as other structures,
which is expected given the sparsity of galaxies in the
catalogue in that part of the volume. This structure is a striking
example of the large-scale structure reconstruction achieved by BORG
from noisy data. By representing the galaxies and the reconstructed


[Figure 7: polar plot in Supergalactic longitude L (ticks at 90° and 270°); colour scale from -1.0 to 3.0]

Figure 7. Supergalactic plane. The thickness is 20 h^-1 Mpc for galaxies (shown
as magenta points); the density is smoothed with a 1 h^-1 Mpc Gaussian kernel
(background field with colour scale indicated below the panel).

Sloan Great Wall on the same sky plot, we see that the Hercules- 
Aries filament inters 


5.4 Local Void analysis 

An interesting feature of non-linear density fields inferred by BORG
is the possibility to uncover unobserved structures. In Figure 8 we
provide a particular example by looking at the Local Void (also
known as Tully's void; Tully & Fisher 1987). In the two panels
we show the mean "final density field", overplotted with either
the 2M++ galaxies (left-hand panel) or the HI Parkes All Sky Survey
(HIPASS) galaxies (right-hand panel; Meyer et al. 2004). The
2M++ galaxies appear in spite of the galactic plane cut and
Figure 8. Galactic plane (b = 0), with 2M++ galaxies on the left and HICAT galaxies on the right. The slice thickness is 40 h^-1 Mpc for the galaxy
representation; the density is smoothed with a 1 h^-1 Mpc Gaussian kernel.


the galactic bulge because we represent a 40 h^-1 Mpc thick slice.
This void is clearly visible at the Galactic longitude l ~ 15° in
both panels and it visually seems to extend from 10 h^-1 Mpc to
60 h^-1 Mpc in the ensemble mean field.

To illustrate a further application of our reconstruction technique,
we assign to each voxel located in the galactic plane a probability
of belonging to a DIVA (Lavaux & Wandelt 2010) void.
We have used the following procedure.

First we smooth the initial density field of each sample of the
Markov Chain created by BORG with a Gaussian filter of 5 h^-1 Mpc.
The choice of this filter size is motivated by the mass it corresponds
to in Lagrangian coordinates. For a Universe with Ω_m = 0.30, a
top-hat filter of 5 h^-1 Mpc would represent ~4 × 10^13 M_⊙. Filtering
over that scale therefore removes the contribution from groups of galaxies
in the classification of the cosmic web.
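
The mass scale quoted above can be checked with a short calculation. The sketch below assumes the standard value of the critical density today, 2.775 × 10^11 h^2 M_⊙ Mpc^-3, and treats the filter as a sphere whose radius equals the filter size; both the function name and that interpretation of the filter size are assumptions made for illustration.

```python
import math

RHO_CRIT = 2.775e11  # critical density today, in h^2 M_sun / Mpc^3

def tophat_mass(radius, omega_m):
    """Mean mass (in h^-1 M_sun) enclosed in a sphere of `radius` (in h^-1 Mpc)."""
    volume = 4.0 / 3.0 * math.pi * radius**3  # (h^-1 Mpc)^3
    return omega_m * RHO_CRIT * volume

mass = tophat_mass(5.0, 0.30)
print(f"{mass:.2e}")
```

The result is about 4 × 10^13 h^-1 M_⊙, which is the mass scale of galaxy groups.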

Then we run the truncated watershed transform on this field.
We identify particles belonging to the identified voids and propagate
them forward in time using 2LPT. We set to one each voxel where
a void particle is found, and we compute the average field over the
samples. By construction the average field becomes the marginalized
probability for each voxel to be in a void:

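\[
\bar{f}_p = \frac{1}{C} \sum_{i=1}^{C} f_p(\{\delta_i\})
          \simeq \int f_p(\{\delta\}) \, P(\delta|\mathrm{data}) \, \mathrm{d}\delta
          = P(p \text{ is in a void}|\mathrm{data}), \tag{7}
\]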

where C is the length of the Markov Chain, f_p({δ_i}) is set to one if
the voxel p belongs to a void assuming initial density fluctuations
{δ_i} and zero otherwise, and P(δ|data) is the conditional marginalized
posterior of the reconstructed initial density fluctuations given the data.
The mean field is thus equal to the probability that p is in a void
given the observational data.
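
The averaging step of equation (7) can be sketched as follows. This is a minimal illustration, assuming that the binary void indicator of each chain sample has already been computed (the watershed and 2LPT steps are not implemented here); the function name and the toy masks are hypothetical.

```python
import numpy as np

def void_probability(void_masks):
    """Average binary void indicators over chain samples.

    void_masks : array of shape (C, ...) with entries 1 where a voxel is
                 classified as void in a given sample and 0 otherwise
    Returns the per-voxel fraction of samples in which the voxel is a void.
    """
    masks = np.asarray(void_masks, dtype=float)
    return masks.mean(axis=0)

# Toy example: C = 4 samples, 3 voxels (hypothetical values).
masks = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
])
prob = void_probability(masks)
print(prob)
```

A voxel flagged in every sample gets probability one, while a voxel flagged in one sample out of four gets 0.25.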

We show the result of this procedure in Figure 9, highlighting
the regions that are definitely voids (dark blue colour) or definitely
not voids (white). We have overplotted the galaxies of the HIPASS catalogue
that are within 10 h^-1 Mpc of the galactic plane. Of course the
regions with a large number of galaxies are more clearly not voids.
On the other hand there is a filament of galaxies at l ~ 60° that is
marked as belonging to a void with a high probability, i.e. greater
than 90%. We note that the void classification probability is
entirely marginalized over all the other variables.

[Figure 9 axes: Galactic longitude l; colour scale: void probability from 0.0 to 1.0]

Figure 9. We show here the probability of a voxel of the galactic plane
to belong to a void. The probability goes from "certainly void" (dark blue)
to "certainly non-void" (white). We have overplotted the galaxies from the
HIPASS survey in magenta shade. The galaxies have been selected such
that they are within 10 h^-1 Mpc from the galactic plane.

The classification here corresponds qualitatively well with the visual
impression of Figure 8, in which the void-like area is located in the most
under-dense region at longitudes between ~0° and ~30°. Most of
the voxels to the right of 0° are identified as non-void. Of course
this classification is not the full story, and it has been advocated by
Lavaux & Wandelt (2010) that one should use a full filtering
hierarchy to characterize the cosmic web dynamically. It is however
a powerful tool to separate the galaxies according to their dynamical
environment. As it would be beyond the scope of this paper,
we postpone this classification to a future work. We also note that
the DIVA classification of the Large Scale structure is not unique,
as other prescriptions have been advocated in other work that rely
only on the present gravitational field (such as Hahn et al. 2007).
However the combination of BORG and DIVA allows us to use the
full dynamical history of Large Scale structures to make the
classification. Contrary to other techniques, it accounts for the fact that
galaxies may have originally formed in environments different from
their present one.
6 SUMMARY AND CONCLUSIONS

This work presents a fully Bayesian data analysis pipeline to study
cosmic structures in galaxy redshift catalogues, derive their statistical
properties and infer corresponding initial conditions as well as
plausible dynamic structure formation histories. This pipeline consists
of the sequential application of two of our Bayesian inference
algorithms.

Specifically, here we have applied this methodology to the
2M++ galaxy compilation (Lavaux & Hudson 2011), spanning
the entire sky to a depth of ~200 h^-1 Mpc. In a first step we
have employed the ARES (Jasche & Wandelt 2013b) algorithm
to infer the cosmological power-spectrum and calibrate luminosity
dependent galaxy biases. As demonstrated in Section HD the
ARES algorithm accurately recovers the shape of a fiducial
cosmological power-spectrum throughout the entire range of Fourier
modes considered in this work. This result clearly demonstrates
that systematics arising from survey geometries, selection effects
and galaxy biases have been accounted for in our Bayesian inference
approach. In particular, we have determined the bias values of
galaxies with luminosities in three bins for magnitudes going from
K_2M++ = -25.00 to K_2M++ = -21.00. We note that our results on
luminosity dependent galaxy biases are consistent with and confirm
the previous findings of Westover (2007).

Based upon these results we performed a highly detailed analysis
of the mildly non-linear and non-linear large scale structure
in the 2M++ galaxy catalogue via the BORG algorithm (Jasche &
Wandelt 2013a). Specifically, we have used the previously inferred
galaxy biases as an input to BORG to infer the large scale structure
of the Nearby Universe within a co-moving equidistant box with a
volume of (600 h^-1 Mpc)^3 centred on the observer. The grid
resolution is ~2.3 h^-1 Mpc, resulting in a total of ~1.6 × 10^7
inference parameters, which can be accurately handled by our Bayesian
inference framework. The algorithm jointly infers the present
non-linear Large Scale structures and their corresponding initial
conditions, at a cosmic scale factor of a ~ 10^-3, from which they
originate. In Section 5 we have demonstrated the results for inferred
three dimensional density fields. These results show highly detailed
Large Scale structures at present and in initial conditions. Further,
we have shown that our Bayesian inference algorithm permits us
to accurately quantify uncertainties inherent to any cosmological
observations. We have thus successfully reconstructed statistically
the initial conditions on large scales of our Local Universe together
with a detailed treatment of survey geometries, selection effects and
tracer biases.

As a particular application of the reconstructed density field
and initial conditions to statistical structure detection, we have
focused on the problem of identifying the Local Void. The Local Void
is typically obscured by the Galaxy and is consequently masked
out in the 2M++ galaxy compilation. To demonstrate the power of
our Bayesian methodology to recover structures in unobserved
regions we have shown that the Local Void is clearly visible in the
reconstructed density field at z = 0 despite the lack of information.
To further quantify the statistical significance of this detection, we
have used the DIVA void classification prescription to generate a map
of the probability that a given volume element is part of the Local
Void. These results indicate a high probability for the existence of
the Local Void behind the Galaxy. The validity of our results is
further supported by comparison with data from the HIPASS catalogue.

The results obtained in this work will be the subject of more
detailed studies of the large scale structure in the Nearby Universe,
including further improvements to the dynamical model used in BORG.

In summary, this work presents a detailed application of our
Bayesian inference framework to data of the 2M++ galaxy catalogue.
In contrast to state-of-the-art approaches, our algorithm accurately
recovers structures in noisy and masked regimes and also
infers the dynamic formation history of individual large scale structures.
As a result this methodology opens new windows to analyse
and understand the Large Scale structures of our Universe.